Abstract. Markov chains in time, such as simple random walks, are at the
heart of probability. In space, due to the absence of an obvious definition of 
past and future, a range of definitions of Markovianity have been proposed. In 
this paper, after a brief review, we introduce a new concept of Markovianity 
that aims to combine spatial and temporal conditional independence. 



1. From Markov chain to Markov point process, and beyond 

This paper is devoted to the fundamental concept of Markovianity. Although its
precise definition depends on the context, common ingredients are conditional
independence and factorisation formulae that allow one to break up complex, or
high dimensional, probabilities into manageable, lower dimensional components.
Thus, computations can be greatly simplified, sometimes to the point that a
detailed probabilistic analysis is possible. If that cannot be done, feasible,
efficient simulation algorithms that exploit the relatively simple building
blocks can usually be designed instead.

1.1. Markov chains 

The family of Markov chains is one of the most fundamental and intensively studied
classes of stochastic processes, see e.g. [?]. If we restrict ourselves to a bounded time
horizon, say $0, 1, \dots, N$ for some $N \in \mathbb{N}$, a discrete Markov chain is a random vector
$(X_0, \dots, X_N)$ with values in some denumerable set $L \neq \emptyset$ for which

\[ P(X_i = x_i \mid X_0 = x_0; \dots; X_{i-1} = x_{i-1}) = P(X_i = x_i \mid X_{i-1} = x_{i-1}) \tag{1} \]

for all $1 \le i \le N$ and all $x_j \in L$, $j = 0, \dots, i$. In words, the Markov property
means that the probabilistic behaviour of the chain at some time $i$ given knowledge
of its complete past depends only on its state at the immediate past $i - 1$, regardless
of how it got to $x_{i-1}$. The right hand side of equation (1) is referred to as the
transition probability at time $i$ from $x_{i-1}$ to $x_i$. If $P(X_i = x_i \mid X_{i-1} = x_{i-1}) =
p(x_{i-1}, x_i)$ does not depend on $i$, the Markov chain is said to be stationary.
By the product rule, the joint distribution can be factorised as

\[ P(X_0 = x_0; \dots; X_N = x_N) = P(X_0 = x_0) \prod_{i=1}^{N} P(X_i = x_i \mid X_{i-1} = x_{i-1}). \tag{2} \]

Consequently,

\[ P(X_i = x_i \mid X_j = x_j, j \neq i) = P(X_i = x_i \mid X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1}) \tag{3} \]

for all $i = 1, \dots, N - 1$, and all $x_j \in L$, $j = 0, \dots, N$ (with obvious modifications
at the extremes $i = 0, N$ of the time interval of interest). Thus, the conditional
distribution of the state at a single point in time depends only on the states at the
immediate past and future. To tie in with a more general concept of Markovianity
to be discussed below, the time slots $i - 1$, $i + 1$ (if within the finite horizon) may
be called neighbours of $i$, $i \in \{0, \dots, N\}$.

As a simple example, consider the celebrated simple random walk on the two
dimensional lattice $L = \mathbb{Z}^2$. The dynamics are as follows: a particle currently
sitting at site $x \in L$ moves to each of the neighbouring sites with equal probability.
More precisely, $p(x, y) = 1/4$ if $x$ and $y$ are horizontally or vertically adjacent in $L$,
that is, $\|x - y\| = 1$, and zero otherwise [25].
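The transition rule above can be illustrated by a short simulation. The following is a minimal sketch, not part of the original text; the function names `transition_probability` and `simulate_walk` and the seeding convention are assumptions made for illustration only.

```python
import random

# Unit steps to the four horizontally or vertically adjacent lattice sites.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def transition_probability(x, y):
    """p(x, y) = 1/4 if x and y are adjacent in Z^2, and 0 otherwise.

    For integer points, Euclidean distance 1 is equivalent to L1 distance 1.
    """
    return 0.25 if abs(x[0] - y[0]) + abs(x[1] - y[1]) == 1 else 0.0


def simulate_walk(n_steps, start=(0, 0), seed=None):
    """Return a path (X_0, ..., X_N) of the simple random walk on Z^2."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        dx, dy = rng.choice(STEPS)  # each neighbour has probability 1/4
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path
```

Each step depends on the current site only, so the simulated path is a stationary Markov chain in the sense of (1).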



1.2. Markov random fields 

The concept of Markovianity plays an important role in space as well as in time,
especially in image analysis and statistical physics. Since, in contrast to the setting
in the previous subsection, space does not allow a natural order, a symmetric,
reflexive neighbourhood relation $\sim$ must be defined on the domain of definition,
which we assume to be a finite set $I$. The generic example is a rectangular grid
$I \subset \mathbb{Z}^2$ with $i \sim j$ if and only if $\|i - j\| \le 1$. Then, a discrete Markov random field
with respect to $\sim$ is a random vector $X = (X_i)_{i \in I}$, with values of the components
$X_i$ in some denumerable set $L \neq \emptyset$, that satisfies the local Markov property

\[ P(X_i = x_i \mid X_j = x_j, j \neq i) = P(X_i = x_i \mid X_j = x_j, j \in \partial(i)) \]

where $\partial(i) = \{ j \in I \setminus \{i\} : j \sim i \}$ is the set of neighbours of $i$. The expression
should be compared to (3); indeed a Markov chain is a Markov random field on
$I = \{0, 1, \dots, N\}$ with $i \sim j$ if and only if $|i - j| \le 1$, $i, j \in I$.

To state the analogue of (2), define a clique to be a set of sites $C \subseteq I$ such that
for all $i, j \in C$ these sites are neighbours, that is $i \sim j$. By default, the empty set
is a clique. If we assume for simplicity that the joint probability mass function of
the random vector $X$ has no zeroes, it defines a Markov random field if and only if
it can be factorised as

\[ P(X_0 = x_0; \dots; X_N = x_N) = \prod_{\text{cliques } C} \varphi_C(x_j, j \in C) \tag{4} \]

for some clique interaction functions $\varphi_C(\cdot) > 0$. Hence, the clique interaction
functions are the spatial analogues of the transition probabilities of a Markov chain.

A two-dimensional example is the Ising model for spins in a magnetic field [11].
In this model, each site of a finite subset of the lattice is assigned a spin
value from the set $L = \{-1, 1\}$. Interaction occurs between spins at horizontally or
vertically adjacent sites, so that cliques consist of at most two points. A particular
spin value may be preferred due to the presence of an external magnetic field. More
precisely, the Ising model is defined by (4) with $\varphi_{\{i\}}(x_i) = e^{\alpha x_i}$ for singletons,
$\varphi_{\{i,j\}}(x_i, x_j) = e^{\beta x_i x_j}$, and $\varphi_\emptyset$ set to the unique constant for which the right hand
side of (4) is a probability mass function. The constant $\alpha$ reflects the external
magnetic field and influences the frequencies of the two spin types. For $\beta > 0$,
neighbouring sites tend to agree in spin; for $\beta < 0$, they tend to have different
spins. For further details, the reader is referred to [11].
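To make the link between the clique factorisation (4) and the local Markov property concrete, the sketch below evaluates the Ising clique product on a small grid and compares the conditional law of one spin given all others with a formula that involves its neighbours only. The grid representation (a dictionary from sites to spins) and all function names are assumptions chosen for illustration.

```python
import math


def neighbours(site, sites):
    """Horizontally or vertically adjacent sites that belong to the grid."""
    i, j = site
    return [s for s in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)) if s in sites]


def weight(x, alpha, beta):
    """Unnormalised product of singleton and pair interaction functions.

    x maps each site (i, j) to a spin in {-1, 1}.
    """
    w = 1.0
    for s, xs in x.items():
        w *= math.exp(alpha * xs)               # singleton clique {s}
        for t in neighbours(s, x):
            if s < t:                           # count each pair clique once
                w *= math.exp(beta * xs * x[t])
    return w


def local_conditional(x, site, alpha, beta):
    """P(X_site = x[site] | all other spins), from the joint weights."""
    flipped = dict(x)
    flipped[site] = -x[site]
    w, wf = weight(x, alpha, beta), weight(flipped, alpha, beta)
    return w / (w + wf)


def neighbour_formula(x, site, alpha, beta):
    """The same conditional probability, using the neighbours of site only."""
    h = alpha + beta * sum(x[t] for t in neighbours(site, x))
    return math.exp(h * x[site]) / (2.0 * math.cosh(h))
```

The normalising constant $\varphi_\emptyset$ cancels in the conditional probability, which is why it never needs to be computed here.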
1.3. Markov point processes

The next step is to leave discrete domains and move into Euclidean space. Thus, let
$D \subset \mathbb{R}^d$ be a disc, rectangle or other bounded set of positive volume. A realisation of
a Markov point process on $D$ is a subset $x = \{x_1, \dots, x_n\}$ of $D$ with random, finite
$n \ge 0$. Again define a reflexive, symmetric neighbourhood relation $\sim$, for instance
by $x \sim y$ if and only if $\|x - y\| < R$ for some $R > 0$. Note that each point has
uncountably many potential neighbours, in contrast to the discrete case. Moreover,
the probability of finding a point at any particular location $a \in D$ will usually
be 0, so densities rather than probability mass functions are needed. A suitable
dominating measure is the homogeneous Poisson process (see e.g. [?]), because of
its lack of spatial interaction. Indeed, given that such a Poisson process places $n$ points
in $D$, the locations are i.i.d. and uniformly distributed over $D$. If desirable, one
could easily attach marks to the points, for example a measurement or type label
of an object located at the point. Doing so, the domain becomes $D \times M$, where $M$
is the mark set, and each $x_i$ can be written as $(d_i, m_i)$.

In this set-up, a Markov point process $X$ is defined by a density $f(\cdot)$ that is
hereditary in the sense that $f(x) > 0$ implies $f(y) > 0$ for all configurations $y \subseteq x$ and
satisfies [?] the local Markov condition, that is, whenever $f(x) > 0$, $u \notin x$, the
ratio

\[ \lambda(u \mid x) := \frac{f(x \cup \{u\})}{f(x)} \tag{5} \]

depends only on $u$ and $\{x_i : u \sim x_i\}$. The function $\lambda(\cdot \mid \cdot)$ is usually referred to
as conditional intensity. For the null set where $f(x) = 0$, $\lambda(\cdot \mid x)$ may be defined
arbitrarily. Note that since realisations $x$ of $X$ are sets of (marked) points, $f(\cdot)$ is
symmetric, i.e. invariant under permutations of the $x_i$ that constitute $x$. If

\[ f(x \cup \{u\}) \le \beta f(x) \]

for some $\beta > 0$ uniformly in $x$ and $u$, the density $f(\cdot)$ is said to be locally stable.
Note that local stability implies that $f(\cdot)$ is hereditary, and that the conditional
intensity (5) is uniformly bounded in both its arguments.

A factorisation is provided by the Hammersley-Clifford theorem which states
[?] that a marked point process with density $f(\cdot)$ is Markov if and only if

\[ f(x) = \prod_{\text{cliques } y \subseteq x} \varphi(y) \tag{6} \]

for some non-negative, measurable interaction functions $\varphi(\cdot)$. The resemblance to
(4) is obvious.

An example of a locally stable Markov point process is the hard core model, a
homogeneous Poisson process conditioned to contain no pair of points that are closer
than $R$ to one another. For the hard core model, $\varphi(\{x, y\}) = 1\{\|x - y\| > R\}$,
$\varphi(y) = 1$ for singletons and configurations $y$ containing three or more points. The
interaction function for the empty set acts as normalising constant to make sure
that $f(\cdot)$ integrates to unity.
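As an illustration of the factorisation (6) and the conditional intensity (5) for the hard core model, the following sketch evaluates the unnormalised density as a product of pair interactions. It is a minimal example under stated assumptions: points are planar coordinate tuples, the normalising constant is omitted, and the convention $\lambda(u \mid x) = 0$ on the null set $f(x) = 0$ is one arbitrary but admissible choice.

```python
import itertools
import math


def hard_core_density(points, R):
    """Unnormalised density: product over pairs of 1{||x - y|| > R}."""
    for p, q in itertools.combinations(points, 2):
        if math.dist(p, q) <= R:
            return 0.0
    return 1.0


def conditional_intensity(u, points, R):
    """lambda(u | x) = f(x + u) / f(x); set to 0 where f(x) = 0 (a convention)."""
    fx = hard_core_density(points, R)
    if fx == 0.0:
        return 0.0
    return hard_core_density(list(points) + [u], R) / fx
```

Since only points within distance $R$ of $u$ can make the ratio vanish, the conditional intensity depends on $u$ and its neighbours alone, which is the local Markov condition.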

Although Markov point processes are useful modelling tools, the assumption
of permutation invariance may be too restrictive. For instance in image
interpretation, occlusion of one object by another may be dealt with by an ordering of
the objects in terms of proximity to the camera [?], or in scene modelling
various kinds of alignment of the parts that make up an object may be modelled
by means of non-symmetric neighbour relations [22]. Non-symmetric densities also
arise naturally when local scale and intensity are taken into account by transforming
the conditional intensity of a homogeneous template process, see [14].

In stark contrast to the vast literature on spatial point processes, sequential 
patterns have not been studied much. Indeed, an analogue to the theory of Markov 
point processes does not exist in the non-symmetrical setting. The present paper 
fills this gap. 



2. Definitions and notation 



This paper is concerned with finite sequential spatial processes. Realisations of such
processes at least consist of a finite sequence

\[ x = (x_1, \dots, x_n), \quad n \in \mathbb{N}_0, \]

of points in some bounded subset $D$ of the plane. Additionally, to each point $x_i$,
a mark $m_i$ in some complete, separable metric space, say $M$, may be attached.
The mark may be a discrete type label, a real valued measurement taken at each
location, or a vector of shape parameters to represent an object located at $x_i$. Thus,
we may write

\[ y = (y_1, \dots, y_n) = ((x_1, m_1), \dots, (x_n, m_n)), \quad n \in \mathbb{N}_0, \]

for the configuration of marked points, and shall denote the family of all such
configurations by $N^f$.

As an aside, the plane $\mathbb{R}^2$ may be replaced by $\mathbb{R}^d$ or any other complete separable
metric space, equipped with the Borel $\sigma$-algebra and a finite diffuse Borel measure.

The distribution of a finite sequential spatial process may be defined as follows.
Given a finite diffuse Borel measure $\mu(\cdot)$ on $(D, \mathcal{B}_D)$ so that $\mu(D) > 0$, usually
Lebesgue measure, and a mark probability measure $\mu_M(\cdot)$ on $M$ equipped with its
Borel $\sigma$-algebra $\mathcal{M}$, specify

(1) a probability mass function $q_n$, $n \in \mathbb{N}_0$, for the number of points in $D$;

(2) for each $n$, a Borel measurable and $(\mu \times \mu_M)^n$-integrable joint probability
density $p_n(y_1, \dots, y_n)$ for the sequence of marked points $y_1, \dots, y_n \in D \times M$,
given it has length $n$.

Alternatively, a probability density $f(\cdot)$ may be specified directly on
$N^f = \bigcup_{n=0}^{\infty} (D \times M)^n$, the space of finite point configurations in $D$ with marks in $M$,
with respect to the reference measure $\nu(\cdot)$ defined by $\nu(F)$ equal to

\[ \sum_{n=0}^{\infty} \frac{e^{-\mu(D)}}{n!} \int_{D \times M} \cdots \int_{D \times M} 1\{(y_1, \dots, y_n) \in F\} \, d\mu \times \mu_M(y_1) \cdots d\mu \times \mu_M(y_n) \]

for $F$ in the $\sigma$-algebra on finite marked point sequences generated by the Borel
product $\sigma$-fields on $(D \times M)^n$. In words, $\nu(\cdot)$ corresponds to a random sequence
of Poisson length with independent components distributed according to the
normalised reference measure $\mu(\cdot) \times \mu_M(\cdot) / \mu(D)$.

It is readily observed that $q_0 = \exp(-\mu(D)) f(\emptyset)$, and

\[ q_n = \frac{e^{-\mu(D)}}{n!} \int_{D \times M} \cdots \int_{D \times M} f(y_1, \dots, y_n) \, d\mu \times \mu_M(y_1) \cdots d\mu \times \mu_M(y_n); \]

\[ p_n(y) = \frac{e^{-\mu(D)}}{n! \, q_n} f(y) \]

for each $n \in \mathbb{N}$ and $y \in (D \times M)^n$.

Reversely, if the length $n(y)$ of $y$ is $n$,

\[ f(y) = e^{\mu(D)} \, n! \, q_n \, p_n(y_1, \dots, y_n). \]

Note that neither $f(\cdot)$ nor the $p_n(\cdot, \dots, \cdot)$ are required to be symmetric [?, Ch. 5].

We are now ready to define and analyse a Markov concept for random sequences
in the plane. To do so, we begin by defining a sequential conditional intensity

\[ \lambda_i((x, m) \mid y) := \frac{f(s_i(y, (x, m)))}{(n + 1) f(y)} \quad \text{if } f(y) > 0 \tag{7} \]

for inserting $(x, m) \notin y$ at position $i \in \{1, \dots, n + 1\}$ of $y = (y_1, \dots, y_n)$. Here
$s_i(y, (x, m)) = (y_1, \dots, y_{i-1}, (x, m), y_i, \dots, y_n)$. On the null set $\{y \in N^f : f(y) = 0\}$,
the sequential conditional intensity may be defined arbitrarily. The overall
conditional probability of finding a marked point at $du = d\mu \times \mu_M(u)$ in any position
in the vector given that the remainder of the sequence equals $y$ is given by

\[ \sum_{i=1}^{n+1} \lambda_i(du \mid y). \]

The expression should be compared to its classic counterpart (5). As for Markov
chains, we are mostly interested in $\lambda_{n+1}(\cdot \mid y)$, but all $\lambda_i(\cdot \mid \cdot)$ are needed for the
reversibility of the dynamic representation to be considered in Section 4.

Note that provided $f(\cdot)$ is hereditary in the sense that $f(y) > 0$ implies $f(z) > 0$
for all subsequences $z$ of $y$, then

\[ f(y_1, \dots, y_n) \propto n! \prod_{i=1}^{n} \lambda_i(y_i \mid y_{<i}) \tag{8} \]

where $y_{<i} = (y_1, \dots, y_{i-1})$. This observation implies an alternative way of defining a
sequential spatial process: by its conditional intensity. Some care has to be taken to
make sure that the resulting density (8) is integrable with respect to $\nu(\cdot)$. A sufficient
condition is that $\lambda_i(y_i \mid y_{<i}) \le \beta / i$ for all choices of $i$, $y_i$, and $y_{<i}$. It is important
to note, though, that such an assumption does not imply that $f(s_j(y, u)) / f(y)$ is
uniformly bounded in $u$ and $y$ for $j \neq n(y) + 1$. The latter stronger assumption,
which may be referred to as local stability in analogy with classic marked point
processes, will be required in Section 4.

In order to generalise the concept of Ripley-Kelly Markovianity [?] to random
sequences, suppose a reflexive relation $\sim$ on $D \times M$ is given with the purpose
of formalising the local interactions. In contrast to the point process context, we do
not require $\sim$ to be symmetric. If $y \sim z$, the marked point $z$ is said to be a directed
neighbour of $y$.

Definition 1. A sequential spatial process $Y$ on a bounded Borel set $D \subset \mathbb{R}^2$ with
marks in $M$ defined by its density $f(\cdot)$ with respect to $\nu(\cdot)$ is said to be Markov
with respect to the relation $\sim$ on $D \times M$ if

• $f(y) > 0$ implies $f(z) > 0$ for all subsequences $z$ of $y$;

• for all sequences $y$ for which $f(y) > 0$, the ratio $f((y, u)) / f(y)$ depends only
on $u$ and its directed neighbours $\{y_i \in y : u \sim y_i\}$ in $y$.

In particular, $\lambda_{n(y)+1}(\cdot \mid y)$ is invariant under permutations of $y$. In the sequel,
we shall sometimes abuse notation somewhat and write $\lambda_{n(y)+1}(\cdot \mid y)$ where $y$ stands
for the unordered set $\{y_i \in y\}$ of components of the sequence and $n(y)$ is the
cardinality of that set.

Due to the lack of symmetry, sequential spatial processes are particularly useful
for modelling inhomogeneous space-time processes. For instance, to obtain a
complete packing of non-intersecting particles [?], a typical algorithm keeps adding
particles randomly until this becomes impossible due to violation of the condition
of no overlap, see e.g. the recent review papers [6, 27]. A related soft-core model
for landslides was proposed by Fiksel and Stoyan [?].

Example 1 (Simple sequential inhibition). Let $\pi(\cdot)$ be a probability density
with respect to Lebesgue measure on a bounded planar Borel set $D$ of strictly
positive area. Suppose a population of animals arrives in the region $D$ to set up
nests, and would like to pick locations according to $\pi(\cdot)$. In other words, those
subregions that have high $\pi$-mass are deemed the most suitable. However, as each
animal needs its space, a newly arriving animal may not build its nest closer than
some $r > 0$ to an existing nest location [?]. Thus, if we write
$d(z, x) = \min\{\|z - x_i\| : x_i \in x\}$ for the minimal distance between $z \in D$ and the
components of $x$,

\[ p_n(x_1, \dots, x_n) \propto \pi(x_1) \, \frac{\pi(x_2) \, 1\{d(x_2, x_1) > r\}}{\int_D \pi(z) \, 1\{d(z, x_1) > r\} \, dz} \cdots \frac{\pi(x_n) \, 1\{d(x_n, x_{<n}) > r\}}{\int_D \pi(z) \, 1\{d(z, x_{<n}) > r\} \, dz} \tag{9} \]

for $x_i \in D$, $i = 1, \dots, n$. Some care has to be taken about division by zero. If $n$ is
larger than the packing number for balls of radius $r$ in $D$, or $q_n = 0$ for some other
reason, $p_n(\cdot, \dots, \cdot)$ may be chosen to be any probability density on $D^n$. For smaller
$n$, or if $q_n > 0$, set $p_n(x_1, \dots, x_n) = 0$ whenever some term in the denominator of
(9) is zero, and renormalise $p_n(\cdot, \dots, \cdot)$ to integrate to unity. The total number of
animals in $D$ is either fixed, as in the formulation by [?], or random according
to some probability mass function $q_n$, $n \le n_p$, the packing number, as in Random
Sequential Adsorption [?].

If $n$ is fixed, that is $q_n = 1\{n = n_0\}$ for some $n_0 \in \mathbb{N}$, $f(x)$ is not hereditary,
therefore not Markovian in the sense of Definition 1. If $q_n > 0$ for $n \le n_p$, and
$\sum_{n=0}^{n_p} q_n = 1$, the density is hereditary. In that case, provided $f(x_1, \dots, x_n) > 0$,

\[ \lambda_{n+1}(u \mid (x_1, \dots, x_n)) = \frac{c_{n+1} \, q_{n+1} \, \pi(u) \, 1\{d(u, x) > r\}}{c_n \, q_n \int_D \pi(z) \, 1\{d(z, x) > r\} \, dz} \]

(with $0/0 = 0$), where $c_n$ is the normalising constant of $p_n(\cdot, \dots, \cdot)$. The
likelihood ratio is invariant under permutations of the components of $x$. However,
it may depend on the sequence length through $c_{n+1} q_{n+1} / (c_n q_n)$, and, moreover,
$\int_D \pi(z) \, 1\{d(z, x) > r\} \, dz$ may depend on the geometry of $x$, i.e. on the whole
sequence. Hence, in general, $f(\cdot)$ is Markovian only with respect to the trivial relation
in which each pair of points is related.

Example 2 (Sequential soft core). Further to Example 1, suppose the animals
have no locational preferences, but claim territory within a certain radius. The
radius can depend deterministically on the location [14] (in fertile regions, less
space is needed than in poorer ones), be random and captured by a mark, or a
combination of both (see footnote 1). To be specific, suppose an animal settling at $x \in D$ claims
the region within a (stochastic) radius $m > 0$ of $x$ for itself, and that newcomers are
persuaded to avoid this region. An appropriate density with respect to $\nu(\cdot)$ could
be

\[ f(y) \propto \beta^{n(y)} \prod_{i=1}^{n(y)} \gamma^{\sum_{j<i} 1\{\|x_i - x_j\| < m_j\}} \]

for $y \ni y_i = (x_i, m_i) \in D \times (0, \infty)$. Here $\beta > 0$ is an intensity parameter, and $0 \le \gamma \le 1$
reflects the strength of persuasion: the smaller $\gamma$, the stronger the inhibition. Indeed,
for $\gamma = 0$, no new arrival is allowed to enter the territory of well-settled animals.
Alternatively, if invaders demand space according to their own mark, replace $m_j$
by $m_i$ in the exponent of $\gamma$.
Note that

\[ (n(y) + 1) \, \lambda_{n(y)+1}((x, m) \mid y) = \beta \, \gamma^{\sum_{i=1}^{n(y)} 1\{\|x - x_i\| < m_i\}} \]

depends only on those $y_j = (x_j, m_j)$ for which $\|x - x_j\| < m_j$. Hence $f(\cdot)$ is Markov
with respect to the relation $(x, r) \sim (y, s) \Leftrightarrow \|x - y\| < s$.

1 Assume the mark space $(0, \infty)$, for concreteness, is equipped with a density $g(\cdot)$. By the
transformation $h_{(c_1, c_2)}(x, m) = (c_1 x, c_2 m)$ for $c_1, c_2 > 0$, the Lebesgue intensity measure $\mu(\cdot)$ is scaled
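The sequential soft core density and its end-of-sequence likelihood ratio can be checked numerically. The sketch below is illustrative only: it works with the unnormalised log density, represents a marked point as a pair `((x1, x2), m)`, and uses hypothetical function names `log_density` and `append_ratio`.

```python
import math


def log_density(y, beta, gamma):
    """Unnormalised log f(y) for a sequence y of marked points ((x1, x2), m).

    Each point y_i is penalised by gamma once for every earlier point y_j
    (j < i) whose territory, a ball of radius m_j around x_j, contains x_i.
    """
    total = len(y) * math.log(beta)
    for i, (xi, _) in enumerate(y):
        hits = sum(1 for xj, mj in y[:i] if math.dist(xi, xj) < mj)
        if hits:
            if gamma == 0.0:
                return -math.inf
            total += hits * math.log(gamma)
    return total


def append_ratio(u, y, beta, gamma):
    """f((y, u)) / f(y) = beta * gamma ** #{j : ||x - x_j|| < m_j}."""
    x, _ = u
    hits = sum(1 for xj, mj in y if math.dist(x, xj) < mj)
    return beta * gamma ** hits
```

Reversing a sequence can change the value of `log_density`, since only earlier points penalise later ones; this is the lack of symmetry that distinguishes the sequential model from a classic Markov point process.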



3. Hammersley Clifford factorisation 

The goal of this section is to formulate and prove a factorisation theorem for
sequential spatial processes in analogy with (4) and (6).

To do so, we need the notion of a directed clique interaction function. Indeed, for
a marked point $z \in D \times M$, define the sequence $y$ to be a $z$-clique with respect to a
reflexive relation $\sim$ on $D \times M$ if $y$ either has length zero or all its components $y_i \in y$
satisfy $z \sim y_i$. The definition is $z$-directed but otherwise permutation invariant, so
we may map $y$ onto the set of its components, an element of the family of unordered
finite marked point configurations, by ignoring the permutation.

Our main theorem is the following.

Theorem 1. A sequential spatial process with density $f(\cdot)$ is Markov with respect
to $\sim$ if and only if it can be factorised as

\[ f(y_1, \dots, y_n) = f(\emptyset) \prod_{i=1}^{n} \prod_{z \subseteq y_{<i}} \varphi(y_i, z) \tag{10} \]

for some non-negative, jointly measurable interaction function $\varphi : (D \times M) \times N^f \to
[0, \infty)$ that vanishes except on cliques (i.e. $\varphi(u, z) = 1$ if $z$ is no $u$-clique with respect
to $\sim$). Here $y_{<i} = \{y_1, \dots, y_{i-1}\}$.

by $c_1^2$ on $c_1 D$, the new mark density is $c_2^{-1} g(m / c_2)$. Let $c(\cdot, \cdot)$ be a scaling function with
component functions $c_i(\cdot, \cdot)$, $i = 1, 2$, such that $0 < \underline{c} \le c_i(x, m) \le \bar{c} < \infty$ uniformly. Let $Y$ be a template
marked point process with hereditary density $f(\cdot)$, conditional intensity $\lambda(\cdot \mid \cdot)$, and define a
sequential conditional intensity with respect to $d\mu_c(x, m) = c_1(x, m)^{-2} c_2(x, m)^{-1} g(m / c_2(x, m))$

by $\lambda_n((x, m) \mid y) := \lambda\big( (x / c_1(x, m), \, m / c_2(x, m)) \mid h^{-1}_{c(x, m)}(y) \big) / (n(y) + 1)$. Provided $f(\cdot)$ is

locally stable, the sequential density $f_{Y_c}(\cdot)$ defined by $\lambda_{Y_c}(\cdot \mid \cdot)$ as in (8) is well-defined.
For example, the marked hard core density /(y) oc Jl 1 {]\xi — Xj\\ > rrii + rrij} with 
\{(x,m) I y) = — Xi\\ > m + rrii for all (xi,rrii) £ y} is transformed into fy {y) oc 

J^Ij^ 1 i ] l^'i — s j 1 1 > ( m i + m j ) } ■ Intuitively, the resulting process Y c looks like a scal- 

ing of Y at (x,m) by c(x,m). Indeed, if Y is Markov with respect to ~, then Y c is Markov with 
respect to (u, r) ~ c (v , s) ( — — r, — £ — r) ~ ( — 7^ — r, — A — r) and the ^-neighbourhood 
d c (x,m) = h c ( x rn )(d(h~^ m ^(x, m))) inherits from d(-) geometric properties such as convexity 
that are invariant under rescaling. 
Proof. To show that any density of the form (10) is Markovian, suppose $f(y) > 0$
but $f(z) = 0$ for some subsequence $z$ of $y$. Then, again with the notation $n(z)$ for
the length of $z$,

\[ f(\emptyset) \prod_{i=1}^{n(z)} \prod_{x \subseteq z_{<i}} \varphi(z_i, x) = 0. \]

Hence either $f(\emptyset) = 0$ or some term $\varphi(z_i, x) = 0$, but, as both feature in the
factorisation (10) of $f(y)$ too, $f(y)$ must be zero as well, in contradiction with the
assumption. Assume that $f(y) > 0$. Now, $f((y, u)) / f(y) = \prod_{z \subseteq y} \varphi(u, z)$ depends
only on $u$ and its directed neighbours in $y$, as $\varphi(u, z) = 1$ whenever $z$ is no $u$-clique.

Reversely, set $\varphi(y_1, \emptyset) := f(y_1) / f(\emptyset) = \lambda_1(y_1 \mid \emptyset)$, and define putative
interaction functions $\varphi(\cdot, \cdot)$ recursively as follows. If $\{y_1, \dots, y_n\}$ is no $y_{n+1}$-clique,
$\varphi(y_{n+1}, \{y_1, \dots, y_n\}) := 1$; else

\[ \varphi(y_{n+1}, \{y_1, \dots, y_n\}) := \frac{f(y_1, \dots, y_{n+1})}{f(y_1, \dots, y_n) \prod_{z} \varphi(y_{n+1}, z)} \tag{11} \]

where the product ranges over all strict subsets $\{y_1, \dots, y_n\} \neq z \subset \{y_1, \dots, y_n\}$. To
deal with any zeroes, we use the convention $0/0 = 0$.

We first show that $\varphi(\cdot, \cdot)$ is a well-defined interaction function. By the Markov
assumption, $f(y_1, \dots, y_{n+1}) / f(y_1, \dots, y_n)$ is invariant under permutations of $y_1, \dots,
y_n$, so a simple induction argument yields the permutation invariance of
$\varphi(y_{n+1}, \{y_1, \dots, y_n\})$ in its second argument. By definition, $\varphi(\cdot, \cdot)$ vanishes except on cliques,
hence it is an interaction function. To show that if the denominator of (11) is zero, so
is the numerator, note that if $f(\emptyset) = 0$, by the assumption that the process is
hereditary, necessarily $f \equiv 0$, which contradicts the fact that $f(\cdot)$ is a probability density.
Therefore $f(\emptyset) > 0$ and $\varphi(y_1, \emptyset)$ is well-defined. Suppose $\varphi(\cdot, \cdot)$ is well-defined for
sets of cardinality at most $n - 1 \ge 0$ as its second argument. Let $\{y_1, \dots, y_n\}$
be a $y_{n+1}$-clique. The Markov assumption implies that if $f(y_1, \dots, y_n) = 0$, also
$f(y_1, \dots, y_{n+1})$ is zero. Furthermore, if $f(y_1, \dots, y_n) > 0$ but $\varphi(y_{n+1}, z) = 0$ for
some strict subset $z \subset \{y_1, \dots, y_n\}$, by the induction assumption, $f((z, y_{n+1})) = 0$,
where the sequence $z$ is obtained from the set $z$ by the permutation induced by $(y_1, \dots, y_n)$.
A fortiori $f(y_1, \dots, y_{n+1}) = 0$. We conclude that (11) is well-defined by induction.

It remains to show that $\varphi(\cdot, \cdot)$ satisfies the desired factorisation. To do so, we
again proceed by induction. By definition, $f(y_1) = f(\emptyset) \varphi(y_1, \emptyset)$ for any $y_1 \in D \times M$,
so the factorisation holds for sequences of length at most 1. Suppose (10) holds
for all sequences that are at most $n \ge 1$ long, and consider any sequence $y =
(y_1, \dots, y_{n+1})$ with components in $D \times M$. If $f(y_1, \dots, y_n) = 0$, by the assumption
on hereditariness, $f(y_1, \dots, y_{n+1}) = 0$. By the induction hypothesis, $f(y_1, \dots, y_n) =
f(\emptyset) \prod_{i=1}^{n} \prod_{z \subseteq y_{<i}} \varphi(y_i, z) = 0$, which implies
$f(\emptyset) \prod_{i=1}^{n+1} \prod_{z \subseteq y_{<i}} \varphi(y_i, z) = 0 = f(y_1, \dots, y_{n+1})$.
Hence without loss of generality we may assume that $f(y_1, \dots, y_n)$
is strictly positive. Then

\[ f(y_1, \dots, y_{n+1}) = \frac{f(y_1, \dots, y_{n+1})}{f(y_1, \dots, y_n)} \, f(y_1, \dots, y_n)
= \frac{f(y_1, \dots, y_{n+1})}{f(y_1, \dots, y_n)} \, f(\emptyset) \prod_{i=1}^{n} \prod_{z \subseteq y_{<i}} \varphi(y_i, z) \tag{12} \]

because of the induction hypothesis. We shall distinguish several cases.

Firstly, suppose that $\{y_1, \dots, y_n\}$ is a $y_{n+1}$-clique and recall that if the
denominator in the right hand side of (11) is zero, then so are $\varphi(y_{n+1}, \{y_1, \dots, y_n\})$
and $f(y_1, \dots, y_{n+1})$, so the desired factorisation holds. If, on the other hand, the
denominator is strictly positive, by definition

\[ \frac{f(y_1, \dots, y_{n+1})}{f(y_1, \dots, y_n)} = \prod_{z \subseteq \{y_1, \dots, y_n\}} \varphi(y_{n+1}, z) \tag{13} \]

and (10) holds by substitution of (13) in (12).

Secondly, assume $\{y_1, \dots, y_n\}$ is no $y_{n+1}$-clique. Then, $y_{n+1} \not\sim y_i$ for some $y_i$,
$i = 1, \dots, n$. Now, by the Markov assumption, $f(y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n) > 0$, and

\[ \lambda_{n+1}(y_{n+1} \mid \{y_1, \dots, y_n\}) = \lambda_{n+1}(y_{n+1} \mid \{y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n\}) \]
\[ = \frac{1}{n+1} \, \frac{f(y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_{n+1})}{f(y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n)} \]
\[ = \frac{1}{n+1} \prod_{z \subseteq \{y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n\}} \varphi(y_{n+1}, z) \]
\[ = \frac{1}{n+1} \prod_{z \subseteq \{y_1, \dots, y_n\}} \varphi(y_{n+1}, z) \]

where the last two equations follow from the induction hypothesis and the fact that
$\varphi(y_{n+1}, z) = 1$ whenever $z$ contains $y_i$. We conclude that (13) holds, hence (10). □

The factorisation (10) is similar to that of the joint distribution of a Markov
chain, cf. (2). Indeed, the product of transition probabilities $p(x_{i-1}, x_i)$ from state
$x_{i-1}$ to $x_i$ is replaced by a product of interaction functions $\prod_{z \subseteq y_{<i}} \varphi(y_i, z)$ over
cliques of 'past' neighbours of $y_i$ similar to (6).

Theorem 1 provides a third way of defining a sequential process: by specifying its
clique interaction functions. Of course one must verify integrability for each choice.

Example 1 (ctd). Recall that we need to assume $q_n > 0$ for $n \le n_p$ and $q_n = 0$
for $n > n_p$ in order to be sure that the simple sequential inhibition model (9) is
hereditary.

Under this assumption, the interaction functions can be computed iteratively by
(11). Set $r_n := n c_n q_n / (c_{n-1} q_{n-1})$ for $n = 1, 2, \dots, n_p$, and 0 otherwise, and $I(x) :=
\int_D \pi(z) \, 1\{d(z, x) > r\} \, dz$ for any finite set $x$ of points in $D$, with $I(\emptyset) = 1$. Then
$\varphi(x, \emptyset) = r_1 \pi(x)$ and

\[ \varphi(x, y) = 1\{d(x, y) > r\} \exp\left[ \sum_{z \subseteq y} (-1)^{n(y \setminus z)} \log\left( \frac{r_{n(z)+1}}{I(z)} \right) \right] \]

for non-empty configurations $y$ with $I(y) > 0$. Otherwise, $\varphi(x, y) = 0$.

In order to verify the above expressions, note that by equation (8), it is sufficient
to verify that the sequential conditional intensity, or equivalently the likelihood ratio
for adding a point at the end of a given sequence, has the desired form. Indeed, for
$x = (x_1, \dots, x_n)$, $n \ge 1$, such that $I(x)$ and $f(x)$ are strictly positive,

\[ \frac{f(s_{n+1}(x, x_{n+1}))}{f(x)} = r_1 \, \pi(x_{n+1}) \exp\left[ \sum_{\emptyset \neq y \subseteq x} \sum_{z \subseteq y} (-1)^{n(y \setminus z)} \log\left( \frac{r_{n(z)+1}}{I(z)} \right) \right] = \frac{r_{n+1} \, \pi(x_{n+1})}{I(x)} \]

provided $d(x_{n+1}, \{x_1, \dots, x_n\}) > r$, which implies $d(x_{n+1}, y) > r$ for any subset
$y$ of $x = \{x_1, \dots, x_n\}$, and zero otherwise. The last identity is a consequence of
Newton's binomium.
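The combinatorial step behind the last identity is the inclusion-exclusion fact that, for any function $g$ on subsets, the double sum over non-empty $y \subseteq x$ and $z \subseteq y$ of $(-1)^{n(y \setminus z)} g(z)$ equals $g(x) - g(\emptyset)$. The following sketch checks this numerically for an arbitrary $g$; the helper names are hypothetical and $g$ is stored as a dictionary keyed by frozensets.

```python
import itertools


def subsets(s):
    """Yield all subsets of s as frozensets, including the empty set."""
    items = list(s)
    for k in range(len(items) + 1):
        for combo in itertools.combinations(items, k):
            yield frozenset(combo)


def double_sum(x, g):
    """Sum over non-empty y in x and z in y of (-1)^{n(y minus z)} g(z)."""
    total = 0.0
    for y in subsets(x):
        if not y:
            continue  # the outer sum excludes the empty set
        for z in subsets(y):
            total += (-1) ** len(y - z) * g[z]
    return total
```

With $g(z) = \log(r_{n(z)+1} / I(z))$ and $g(\emptyset) = \log r_1$, exponentiating $g(x) - g(\emptyset)$ and multiplying by $r_1 \pi(x_{n+1})$ gives $r_{n+1} \pi(x_{n+1}) / I(x)$.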

Example 2 (ctd). For the sequential soft core model,

\[ \varphi(y, \emptyset) = \beta; \]

\[ \varphi((x_1, m_1), (x_2, m_2)) = \gamma^{1\{\|x_1 - x_2\| < m_2\}}, \]

and 1 otherwise (see footnote 2).

Example 3. Pairwise interaction models of the form

\[ f(y_1, \dots, y_n) = f(\emptyset) \prod_{i=1}^{n} \left[ \varphi(y_i, \emptyset) \prod_{j<i} \varphi(y_i, y_j) \right] \]

are particularly convenient. Note that the sequential soft core model discussed in
the previous example exhibits pairwise interactions only. It has a constant penalty
term $\gamma \in [0, 1)$ for each pair $y_i \sim y_j$. It may be more natural to let the interaction
vary with distance, for example quadratically as in

\[ \varphi((x, r), (y, s)) = 1 - \left( 1 - \|x - y\|^2 / R_{r,s}^2 \right)^2 \]

for $\|x - y\| \le R_{r,s}$. Here the range of interaction $R_{r,s}$ may depend on the marks.
A sufficient condition for the model to be integrable is that $\varphi(y, \emptyset)$ is uniformly
bounded in $y$ and the pairwise interaction function $\varphi(y_i, y_j)$ is bounded by 1. Thus,
pairwise interaction models are particularly suitable for modelling repulsion
between marked points.



4. Dynamic representation 

Markov sequential spatial processes arise naturally as the limit distribution of a 
jump process with transitions that add or remove one component of the se- 
quence at a time. From a statistical point of view, it is then possible to obtain 
samples from some sequential spatial model of interest by running the pure jump 
process into equilibrium. Standard Markov chain Monte Carlo ideas can then 
be applied to estimate model parameters by the maximum likelihood principle, to 
perform likelihood ratio tests to assess the goodness-of-fit of the model, to compute 
confidence regions, profile likelihoods and so on. Moreover, at least in principle, 
it is possible to determine exactly when equilibrium is reached, by applying the
coupling ideas of Propp and Wilson along the lines proposed in [?] and
the references therein.

Below, we first propose a birth-and-death jump process, then develop a discrete 
time Metropolis-Hastings style sampler. 

2 The density of the scaled process Y c of Example 2 can be factorised as 

^ \c(Vi)( Z ^ U ^ ci(Hi) ' e 2 («j))} J lf dcnotes tne interaction function 
of a template Markov marked point process Y. 
4.1. Spatial birth-and-death processes

A spatial birth-and-death process is a continuous time Markov process with state
space $N^f$. Its only transitions are the insertion of a marked point in the current
sequence (a birth), or the deletion of a component (a death). Suppose the current
state is $y$, and write $b_i(y, u) \, d\mu \times \mu_M(u)$ for the birth rate of a new marked point
in $du$, $u \in D \times M$, to be inserted at position $i$ of $y$, and $d_i(y)$ for the death rate of
the reverse transition. Then the detailed balance equations are given by

\[ \frac{1}{n!} f(y) \, b_i(y, u) = \frac{1}{(n+1)!} f(s_i(y, u)) \, d_i(s_i(y, u)). \]

If we take unit death rate for each marked point,

\[ b_i(y, u) = \frac{f(s_i(y, u))}{(n + 1) f(y)} \tag{14} \]

again with the convention $0/0 = 0$. Note that if the jump process starts in the set
$\{y \in N^f : f(y) > 0\}$, it will almost surely never leave it. Clearly, for this choice of
birth and death rates, $f(\cdot)$ is an invariant probability density.

Since our process evolves in continuous time, some care has to be taken to avoid
explosion, that is, infinitely many transitions in finite time. Set

\[ B_n := \sup_{n(y) = n} \sum_{i=1}^{n+1} \int_{D \times M} b_i(y, u) \, d\mu \times \mu_M(u) \]

for the upper bound on the total birth rate from sequences of length $n \ge 0$, and
note that

\[ \delta_n := \inf_{n(y) = n} \sum_{i=1}^{n} d_i(y) = n. \]

By [?, Prop. 5.1, Thm. 7.1], sufficient conditions for the existence of a unique
jump process with birth rates (14) and unit death rates with a unique invariant
probability measure to which it converges in distribution regardless of the initial
state are that $\delta_n > 0$ for all $n \ge 1$ and one of the following holds:

• $B_n = 0$ for all sufficiently large $n \ge n_0 \ge 0$;

• $B_n > 0$ for all $n \ge 1$ and

\[ \sum_{n=1}^{\infty} \frac{\delta_1 \cdots \delta_n}{B_1 \cdots B_n} = \infty; \qquad \sum_{n=1}^{\infty} \frac{B_1 \cdots B_{n-1}}{\delta_1 \cdots \delta_n} < \infty. \]

For densities $f(\cdot)$, such as the sequential soft core model of Example 2, that
satisfy

\[ f(s_i(y, u)) \le \beta f(y) \tag{15} \]

for some $\beta > 0$ and all $y \in N^f$, $u \in D \times M$, $i = 1, \dots, n(y) + 1$, the total birth rate is
bounded by $\beta \mu(D)$, and it is easily verified that the above Preston conditions hold.
Moreover, from a practical point of view, the jump process may be implemented by
thinning the transitions of a process with unit death rate and constant birth rate
$b_i(y, \cdot) = \beta / (n(y) + 1)$, which avoids having to compute explicitly the parameter

\[ \sum_{i=1}^{n(y)+1} \int_{D \times M} b_i(y, u) \, d\mu \times \mu_M(u) + n(y) \]

of the exponentially distributed waiting times in between jumps. The retention
probability of a transition from $y$ to $s_i(y, u)$ is given by

\[ \frac{f(s_i(y, u))}{\beta f(y)} = \frac{1}{\beta} \prod_{z \subseteq y_{<i}} \varphi(u, z) \prod_{j=i}^{n(y)} \prod_{z \subseteq y_{<j}} \varphi(y_j, z \cup \{u\}). \]

The first term depends on the directed neighbours $y_j$ of $u$ ($u \sim y_j$) in $y_{<i}$. The
interaction functions in the last product reduce to 1 if $y_j \not\sim u$.
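The thinning construction can be sketched for the sequential soft core model of Example 2. This is an illustration under explicit assumptions, not the author's implementation: $D$ is taken to be the unit square with Lebesgue measure (so $\mu(D) = 1$), marks are drawn uniformly from $(0, m_{\max}]$, positions are 0-based in the code, and all function names are hypothetical.

```python
import math
import random


def log_f(y, beta, gamma):
    """Unnormalised log density of the sequential soft core model."""
    total = len(y) * math.log(beta)
    for i, (xi, _) in enumerate(y):
        hits = sum(1 for xj, mj in y[:i] if math.dist(xi, xj) < mj)
        if hits:
            if gamma == 0.0:
                return -math.inf
            total += hits * math.log(gamma)
    return total


def retention_probability(y, i, u, beta, gamma):
    """f(s_i(y, u)) / (beta f(y)) for insertion before 0-based index i."""
    new = y[:i] + [u] + y[i:]
    return math.exp(log_f(new, beta, gamma) - log_f(y, beta, gamma)) / beta


def birth_death(t_end, beta, gamma, rng, mu_D=1.0, m_max=0.2):
    """Run the thinned birth-and-death process from the empty sequence."""
    y, t = [], 0.0
    while True:
        n = len(y)
        total_rate = beta * mu_D + n          # dominating birth rate + deaths
        t += rng.expovariate(total_rate)      # exponential waiting time
        if t > t_end:
            return y
        if rng.random() < beta * mu_D / total_rate:
            i = rng.randrange(n + 1)          # uniform insertion position
            u = ((rng.random(), rng.random()), m_max * (1.0 - rng.random()))
            if rng.random() < retention_probability(y, i, u, beta, gamma):
                y = y[:i] + [u] + y[i:]       # proposed birth is retained
        else:
            j = rng.randrange(n)              # unit death rate per component
            y = y[:j] + y[j + 1:]
```

Because $\gamma \le 1$, the ratio $f(s_i(y, u)) / f(y)$ never exceeds $\beta$ for this model, so the retention probability is a genuine probability, in line with (15).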



4-2. A Metropolis-Hastings sampler 

A popular, flexible and generally applicable technique for generating samples from a 
complex or high-dimensional probability density is the Metropolis-Hastings method, 
see e.g. ^(j- Briefly, the method works by proposing an update according to a 
distribution that is convenient to sample from, and then accepting or rejecting this
proposal with a probability chosen so that the detailed balance equations are
satisfied.

In the context of this paper, two types of proposals are considered: births and
deaths. More precisely, given that the current state is $\mathbf{y}$, with probability 1/2 propose
a birth, otherwise a death. In the first case, select a position $i$ at which to insert
uniformly on $\{1, \ldots, n(\mathbf{y}) + 1\}$ and a new marked point $u$ uniformly on $D \times M$
(w.r.t. $\mu \times \mu_M$), and accept the new state $s_i(\mathbf{y}, u)$ with probability

\[ \alpha(\mathbf{y}, s_i(\mathbf{y}, u)) := \min\left\{ 1, \frac{f(s_i(\mathbf{y}, u)) \, \mu(D)}{f(\mathbf{y}) \, (n(\mathbf{y}) + 1)} \right\}. \]

In case a death is proposed, if y is the empty sequence, do nothing; otherwise, select 
the position of the marked point to be removed uniformly, say i, and accept the 
transition with probability 

\[ \alpha(\mathbf{y}, \mathbf{y}_{(-i)}) := \min\left\{ 1, \frac{f(\mathbf{y}_{(-i)}) \, n(\mathbf{y})}{f(\mathbf{y}) \, \mu(D)} \right\}, \]

where $\mathbf{y}_{(-i)}$ denotes the subsequence of $\mathbf{y}$ obtained by removing the $i$th component.
It is easily seen that $f(\cdot)$ is an invariant density. If we start the chain in a sequence
$\mathbf{y}$ for which $f(\mathbf{y}) > 0$, the chain will almost surely never leave the set
$H := \{\mathbf{y} \in N^{\mathrm{f}} : f(\mathbf{y}) > 0\}$. We shall restrict the Markov chain to $H$ from now on.
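The birth and death moves just described can be sketched in Python as follows. This is an illustrative sketch only: the names `mh_step` and `run_chain`, the 0-based positions, and the convention that `ratio(y, i, u)` returns $f(s_i(\mathbf{y}, u)) / f(\mathbf{y})$ are assumptions made here, not part of the paper. The death acceptance uses the fact that $f(\mathbf{y}_{(-i)}) / f(\mathbf{y})$ is the reciprocal of the ratio for re-inserting the removed component at position $i$.

```python
import random


def mh_step(y, ratio, mu_D, sample_point, rng):
    """One birth-death Metropolis-Hastings update of the list y (in place)."""
    n = len(y)
    if rng.random() < 0.5:
        # Birth: uniform slot among n + 1 positions, uniform new point on D x M.
        i = rng.randrange(n + 1)
        u = sample_point(rng)
        accept = min(1.0, ratio(y, i, u) * mu_D / (n + 1))
        if rng.random() < accept:
            y.insert(i, u)
    elif n > 0:
        # Death: remove a uniformly chosen component; an empty sequence is left alone.
        i = rng.randrange(n)
        rest = y[:i] + y[i + 1:]
        # ratio(rest, i, y[i]) = f(y) / f(y_(-i)), positive while the chain stays in H.
        accept = min(1.0, n / (ratio(rest, i, y[i]) * mu_D))
        if rng.random() < accept:
            del y[i]
    return y


def run_chain(ratio, mu_D, sample_point, n_steps, seed=0):
    """Run the chain from the empty sequence and record the sequence lengths."""
    rng = random.Random(seed)
    y, counts = [], []
    for _ in range(n_steps):
        mh_step(y, ratio, mu_D, sample_point, rng)
        counts.append(len(y))
    return y, counts
```

As a sanity check under these assumptions, take $D = [0, 1]$ without marks and the constant ratio `lambda y, i, u: 5.0`. The sequence length is then itself a Markov chain whose detailed balance equations give a Poisson stationary law with mean $\beta \mu(D) = 5$, so the average of `counts` over a long run of `run_chain(lambda y, i, u: 5.0, 1.0, lambda rng: rng.random(), 20000)` should be close to 5.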

In order to show that the Metropolis-Hastings chain $Y_n$, $n \in \mathbb{N}_0$, converges to $f(\cdot)$
in total variation from any initial state in $H$, we need to establish aperiodicity and
Harris recurrence, that is, Harris ergodicity [21]. Aperiodicity immediately follows
from the fact that self-transitions occur. A sufficient condition for Harris recurrence
(in fact for the even stronger property of geometric ergodicity) is that there exist a
function $V : N^{\mathrm{f}} \cap H \to [1, \infty)$, constants $b < \infty$ and $\gamma < 1$, and a measurable small
set $C \subseteq N^{\mathrm{f}} \cap H$ such that

\[ PV(\mathbf{y}) := \mathbb{E}\left[ V(Y_{n+1}) \mid Y_n = \mathbf{y} \right] \leq \gamma V(\mathbf{y}) + b \, 1\{\mathbf{y} \in C\}. \qquad (16) \]

Recall that a set $C$ is small if there exist a non-zero measure $\varphi(\cdot)$ and an integer
$n$ such that the probability that the Metropolis-Hastings chain reaches any measurable
subset $B$ of $N^{\mathrm{f}}$ from any $\mathbf{y} \in C$ in $n$ steps is at least as large as $\varphi(B)$. See
[21] for further details.



From now on, assume that (15) holds. Then, the drift condition (16) can be
verified by an adaptation of the proof of [12, Prop. 3.3] with $V(\mathbf{y}) = A^{n(\mathbf{y})}$ for some
$A > 1$. Indeed, note that the acceptance probability for inserting $u \in D \times M$ at
position $i$ in $\mathbf{y}$ is bounded by $\beta \mu(D) / (n(\mathbf{y}) + 1)$, which does not exceed a prefixed
constant $\epsilon > 0$ if $n(\mathbf{y})$ is sufficiently large. Similarly, the acceptance probability for
removing a component from the sequence $\mathbf{y}$ reduces to 1 if $\mathbf{y}$ is long enough. Now,

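\[ PV(\mathbf{y}) = \frac{1}{2} \, \frac{(A - 1) V(\mathbf{y})}{\mu(D) \, (n(\mathbf{y}) + 1)} \sum_{i=1}^{n(\mathbf{y})+1} \int_{D \times M} \alpha(\mathbf{y}, s_i(\mathbf{y}, u)) \, d(\mu \times \mu_M)(u) \]
\[ \qquad + \frac{1}{2} \, \frac{(A^{-1} - 1) V(\mathbf{y})}{n(\mathbf{y})} \sum_{i=1}^{n(\mathbf{y})} \alpha(\mathbf{y}, \mathbf{y}_{(-i)}) + V(\mathbf{y}). \qquad (17) \]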

Since for sufficiently long sequences of marked points, say $n(\mathbf{y}) > N_\epsilon$, deaths are
always accepted and $\alpha(\mathbf{y}, s_i(\mathbf{y}, u)) \leq \epsilon$ uniformly in its arguments, (17) is less
than or equal to $\left[ \frac{1}{2} (A - 1) \epsilon + \frac{1}{2} (A^{-1} - 1) + 1 \right] V(\mathbf{y})$. This and the uniform lower
bound $1 / (\beta \mu(D))$ on the acceptance probability for deaths yield the desired result
by the same arguments as in the proof of Propositions 3.2-3.3 in [12] with
$C = \bigcup_{n=0}^{N_\epsilon} (D \times M)^n \cap H$.
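For the bracketed factor to serve as the constant $\gamma < 1$ required in (16), $\epsilon$ has to be small relative to $A$. The following short calculation, which is not spelled out in the text, makes the requirement explicit:

\[ \gamma := \tfrac{1}{2} (A - 1) \epsilon + \tfrac{1}{2} (A^{-1} - 1) + 1 < 1 \iff (A - 1) \epsilon < 1 - A^{-1} = \frac{A - 1}{A} \iff \epsilon < \frac{1}{A}. \]

For example, $A = 2$ and $\epsilon = 1/4$ give $\gamma = 1/8 - 1/4 + 1 = 7/8$.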



4.3. Reversible jump Markov chains

The Metropolis-Hastings algorithm of the previous subsection is a special case of a
so-called reversible jump Markov chain [13]. This is a proposal-acceptance/rejection
scheme for moves between subspaces of different dimension. Suppose a target
equilibrium distribution and a proposal probability for each move type are given. The
aim is to define acceptance probabilities in such a way that the resulting Markov
chain is well-defined and the detailed balance equations hold. In order to avoid
singularities due to the different dimensions, for each move type, a symmetric
dominating measure $\xi(\cdot, \cdot)$ on the product of the state space with itself is needed. It turns
out [13] that acceptance probabilities that are ratios of the joint density of the
target and proposal distributions with respect to $\xi(\cdot, \cdot)$, evaluated at the proposed and
current state, do the trick.

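In the context of Section 4.2, define, for measurable subsets $A$ and $B$ of $N^{\mathrm{f}}$,

\[ \xi(A \times B) = \int_{N^{\mathrm{f}}} 1\{\mathbf{y} \in A\} \, \frac{1}{n(\mathbf{y}) + 1} \sum_{i=1}^{n(\mathbf{y})+1} \int_{D \times M} 1\{s_i(\mathbf{y}, u) \in B\} \, d(\mu \times \mu_M)(u) \, d\nu(\mathbf{y}) \]
\[ \qquad + \int_{N^{\mathrm{f}}} 1\{\mathbf{y} \in A\} \sum_{i=1}^{n(\mathbf{y})} 1\{\mathbf{y}_{(-i)} \in B\} \, d\nu(\mathbf{y}). \]

The measure is symmetric by the form of the reference measure $\nu(\cdot)$.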

The joint bi-variate density of the product of the distribution to sample from
and the birth proposal distribution with respect to $\xi(\cdot, \cdot)$ is given by
$f(\mathbf{y}, s_i(\mathbf{y}, u)) = f(\mathbf{y}) / (2 \mu(D))$. Similarly, for death proposals we have
$f(\mathbf{y}, \mathbf{y}_{(-i)}) = f(\mathbf{y}) / (2 n(\mathbf{y}))$.
Hence, the acceptance probabilities for births and deaths, as given in the previous
section, follow as ratios of joint bi-variate densities truncated at 1. Note that there
is no need to worry about division by zero, as we had restricted the
Metropolis-Hastings chain to the family of sequences having strictly positive density.
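As a check, the two ratios can be written out explicitly. The reverse of a birth from $\mathbf{y}$ is read here as a death from $s_i(\mathbf{y}, u)$, a sequence of length $n(\mathbf{y}) + 1$, and the reverse of a death from $\mathbf{y}$ as a birth from $\mathbf{y}_{(-i)}$:

\[ \frac{f(s_i(\mathbf{y}, u), \mathbf{y})}{f(\mathbf{y}, s_i(\mathbf{y}, u))} = \frac{f(s_i(\mathbf{y}, u)) / (2 (n(\mathbf{y}) + 1))}{f(\mathbf{y}) / (2 \mu(D))} = \frac{f(s_i(\mathbf{y}, u)) \, \mu(D)}{f(\mathbf{y}) \, (n(\mathbf{y}) + 1)}, \]

\[ \frac{f(\mathbf{y}_{(-i)}, \mathbf{y})}{f(\mathbf{y}, \mathbf{y}_{(-i)})} = \frac{f(\mathbf{y}_{(-i)}) / (2 \mu(D))}{f(\mathbf{y}) / (2 n(\mathbf{y}))} = \frac{f(\mathbf{y}_{(-i)}) \, n(\mathbf{y})}{f(\mathbf{y}) \, \mu(D)}. \]

Truncating each ratio at 1 gives the birth and death acceptance probabilities, respectively.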